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br 1 Abstract 



•^T It is well known that, using fast algorithms for polynomial multiplication and 

— ^ division, evaluation of a polynomial F € C[x] of degree n at n complex- valued points 

can be done with O(n) exact held operations in C, where O(-) means that we omit 

polylogarithmic factors. We complement this result by an analysis of approximate 

multipoint evaluation of F to a precision of L bits after the binary point and prove a 

^h bit complexity of 0(n(L + r + riT)), where 2 T and 2 r , with r, T <G N>i, are bounds 

on the magnitude of the coefficients of F and the evaluation points, respectively. In 

r/^ particular, in the important case where the precision demand dominates the other 

O input parameters, the complexity is soft-linear in n and L. 

Our result on approximate multipoint evaluation has some interesting conse- 

^_^ qucnccs on the bit complexity of three further approximation algorithms which 

^- all use polynomial evaluation as a key subroutine. This comprises an algorithm to 

approximate the real roots of a polynomial, an algorithm for polynomial interpolation, 

and a method for computing a Taylor shift of a polynomial. For all of the latter 

algorithms, we derive near optimal running times. 
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1 Introduction 

We study the problem of approximately evaluating a polynomial F G C[x] of degree n 
at n points xi, . . . , x n e C. More precisely, we assume the existence of an oracle which 
provides arbitrarily good approximations of the polynomial's coefficients as well as of the 
points Xi for free; that is, querying the oracle is as expensive as reading the approximation. 
Under this assumption, we aim to compute approximations g/j of yi := F{xi) such that 
\]Ji — V~i\ < 2 for all i = 1, . . . , n, where L £ N is a given non- negative integer. In what 
follows, let 2 T and 2 r with r, T G N>i be upper bounds for the absolute values of the 
coefficients of F and the points Xj, respectively. 

When considering a sequential approach, where each yi ss F(xi) is computed inde- 
pendently by using Horner's Scheme and approximate but certified interval arithmetic 
[9], we need 0(n) arithmetic operations with a precision of 0(L + r + nT) for each of 
the points x^. Thus, the total cost for all evaluations is bounded by 0(n 2 (L + r + nT)) 
bit operations. 1 

In this paper, we show that using an approximate variant of the classical fast multipoint 
evaluation scheme [12, 7], we can improve upon the latter bound by a factor of n to achieve 
0(n(L + r + nT)) bit operations. The classical fast multipoint evaluation algorithm 
reduces polynomial evaluation at n points to successive polynomial multiplications and 
divisions which are all balanced with respect to degree. It is a well known fact that, 
for exactly computing all values yi, it uses only 0{n log 2 n) exact field operations in C 
compared to 0(n 2 ) field operations if all evaluations are carried out independently; see 
Section 2 for a short review. This method has mostly been studied for low precisions, 
in particular for its performance with machine floating point numbers; see, e.g., [2, 
Section 2] or the extensive discussion in [11]. It is widely considered to be numerically 
unstable, mainly due to the need of polynomial divisions, and the precision demand for 
the sequential evaluations based on Horner's scheme does not directly carry over. 

In previous work (e.g., [13, 14]), more involved algorithms for fast approximate 
multipoint evaluation have been introduced that allow to decrease the total number of 
(infinite precision) arithmetic operations from 0(nlog n) to 0(n log n) (if n dominates 
all other input parameters). The authors mainly focus on the arithmetic complexity of 
their algorithms, and thus no bound on the bit complexity is given. For the special case 
where the points Xi are the roots of unity, the problem can be solved with 0(n(r-\-L)) bit 
operations by carrying out the fast Fourier transform with approximate arithmetic [18, 
Theorem 8.3]. However, for general points, we are not aware of any bit complexity result 
which considerably improves upon the bound 0(n 2 (L + T + nT)) that one directly obtains 
from carrying out all evaluations independently. 

The main contribution of this paper is to show that the previously claimed issue 
of numerical instability within the classical fast multipoint evaluation scheme can be 
resolved. The crux of our approach is best described as follows: First, we exploit the 
fact that all divisors in the considered polynomial divisions are monic polynomials 
9i,j(x) = (x — S(j_i).2hi) ■ ■ ■ (x — Xj. 2 i), with i from 1 to logre — 1 and j from 1 to n/2 % , 
which allow a numerical stable division, at least if the precision L dominates the values n 



1 means that polylogarithmic factors are ignored. 



and V; see Corollary 9 for a precise statement. Second, we consider a numerical division 
algorithm from Schonhage [16] which yields an output precision of L bits after the binary 
point if the algorithm runs with an internal precision of 2L. However, we aim to stress 
the fact (as proven in Section 2.2) that, for the input of the division algorithm, it suffices 
to consider an approximation of only L bits after the binary point. This turns out to be 
crucial in the multipoint evaluation algorithm where we have to consider a number of 
logn successive divisions, and thus the propagated error stays within ~ 2 compared 
to « 2~ L / n . 

Our result has an interesting consequence on the bit complexity of other approximation 
algorithms which all use polynomial evaluation as a key subroutine. Most algorithms 
for computing approximations of the real roots of a square-free polynomial F G R[x], 
with n '■= deg F, consider polynomial evaluation of F at a constant number of points in 
some isolating interval I (e.g., at its endpoints) that contains exactly one root of F. In 
Section 3.1, we combine our fast method for approximate multipoint evaluation with a 
recently introduced algorithm for real root approximation, called Aqir [9], to show that 
the bit complexity for computing L-bit approximations of all real roots of F improves 
by a factor of n from 0(n 2 L) to 0{nL) if L dominates parameters that only depend on 
F (e.g., the separation of its roots). The latter result mainly stems from the fact that, 
instead of considering the refinements of each of the isolating intervals independently, we 
may carry out all evaluations of F in parallel. 2 

Another important application of multipoint evaluation is polynomial interpolation. 
For given points xi,...,x n G C and corresponding interpolation values v±, . . . ,v n , there 
exists a unique polynomial F G C[x] of degree less than n such that F{xi) = vi for all i. 
Based on our approach for fast multipoint evaluation, we prove that computing an L-bit 
approximation F of F (i.e., \\F — F\\\ < 2 ) uses only 0(nL) bit operations (for L 
dominating n and the bitsizes of the Xj's and v^s). Our more general complexity bound 
as stated in Section 3.2 also involves the absolute values of the points X{ and the values 
vi as well as the geometric location of the x^s. 

Finally, we combine fast approximate multipoint evaluation and approximate interpo- 
lation in order to derive an alternative method to [16, Theorem 8.4] for computing an L-bit 
approximation of a Taylor shift of a polynomial F (i.e., the polynomial F m (x) := F(m+x) 
for some m G C) with 0(nL) bit operations (again, for L dominating). The details are 
given in Section 3.3. 



2 Very recent work [19] introduces an alternative method for real root refinement which, for the task of 
refining a single isolating interval, achieves comparable running times as Aqir. In a preliminary version 
of their conference paper (which has been sent by the authors to M. Sagraloff in April 2013), the authors 
claim that using approximate multipoint evaluation also yields an improvement by a factor n for their 
method. Given the results from this paper, this seems to be correct, however, their version of the paper 
did not contain a rigorous argument to bound the precision demand for the fast multipoint evaluation. 



2 Approximate Polynomial Multipoint Evaluation 

Given a polynomial F(x) = Y17=o f iX% ^ ^[x] of degree n, complex points x\, . . . , x n G C, 
and a non-negative integer L G N, our goal is to compute approximations j)j for j/j := F{xj) 
such that |yj — 2/j| < 2~ L for all j = 1, ... , n. Furthermore, let 2 T and 2 r , with r, T G N>i, 
denote bounds on the absolute values of the coefficients of F and the points Xj , respectively. 
For the sake of simplicity, assume that n = 2 k is a power of two; otherwise, pad / 
with zeros. We require that arbitrary good approximations of the coefficients fi and the 
points Xj are provided by an oracle for the cost of reading the approximations. That 
is, asking for an approximation of F and the points Xj to a precision of £ bits after the 
binary point takes 0(n(r + T + £)) bit operations. 

Algorithm 1 (Multipoint evaluation). We will follow the classical divide-and-conquer 
method for fast polynomial multipoint evaluation: 

1. Starting with the linear factors goj(x) := x — Xj, we recursively compute the 
subproduct tree 

9i,j{x) ■■= (X - SB(,-_i)2*+l) • • • (x ~ X j2 i) = gi-l,2j-l(x) ■ 9i-l,2j{x) (1) 

for i from 1 to k — 1 and j from 1 to n/2* = 2 , that is, going up from the leaves. 
Notice that deg^j = 2 l . 

2. Starting with r^i(x) := F{x), we recursively compute the remainder tree 

n,j{x) ■= F(x) mod gi,j(x) = rj+ijj/2] (x) mod gij{x) 

for i from k — 1 to and j from 1 to n/2 l = 2 k ~ l , that is, going down from the 
root. Notice that degrjj < 2\ 

3. Observe that the value at point Xj is exactly the remainder 

tqj = F(x) mod go,j(x) = F(x) mod (x — Xj) = F(xj) G C. 

It is well known that this scheme requires a total number of 0(/i(n) log n) arith- 
metic operations in C (e.g., see [3, Chapter 1, Section 4] or [7, Corollary 10.8]), where 
n(n) denotes the arithmetic complexity of multiplying two polynomials of degree n or, 
equivalently, the bit complexity of multiplying two n-bit integers. Hence, using an asymp- 
totically fast multiplication method with soft-linear bit complexity such as the algorithms 
by De et al. [4] or Fiirer [5] yields a soft-linear arithmetic complexity for polynomial 
multipoint evaluation. However, we are mainly interested in the bit complexity of the 
above algorithm if the multiplications and divisions are carried out with approximate 
but certified arithmetic such that an output precision of L bits after the binary point 
can be guaranteed. Fast polynomial division is widely considered to be numerically 
instable which explains why a result on the bit complexity of approximate polynomial 
evaluation is still missing. We will close this gap by using a method from Schonhage for 
numerical polynomial division based on a direct application of discrete Fourier transforms 
to minimize the number of numerically unstable operations; see Section 2.2. 



2.1 Fast Approximate Polynomial Multiplication 

Definition 2 (Polynomial approximation). Let ||-|| be a norm on the set of complex 
polynomials considered as a vector space over C. For a polynomial / = Ya=o a i x% ^ <C[x] 
and an integer £, a polynomial / £ C[x] is called an (absolute) £-bit approximation of f 
w.r.t. || -|| if ||/-/ 1| < 1~ l . Alternatively, if / = / + A/, this is equivalent to ||A/|| < 2~ e . 
When not mentioned explicitly, we assume the norm to be the 1- or sum- norm ||-||i 

with n/iii = e™=oN- 

The definition of an (absolute) polynomial approximation does not take into account 
the degree. Typically, degree loss arises when approximating a polynomial with very 
small leading coefficients which may be truncated to zero. However, the definition also 
allows for a higher (but finite) degree of the approximation. 

We further remark that any ^-bit ||-||i-approximation of a polynomial implies an ^-bit 
approximation of each coefficient or, in other words, an ^-bit approximation w.r.t. the oo- 
or maximum-norm ||/||oo = maxj|aj|. Conversely, any coefficient-wise approximation / 
on / to £ + log(n + 1) bits, with h = deg/, constitutes an ^-bit ||-||i-approximation of /. 

The reason why we favor the sum-norm is its sub-multiplicativity property, that is, 
for f,g€C[x], we have 

||/-<7||i<ll/l|i-||<7||i- (2) 

In practice, we also require the precision of the coefficients to be not too high in 
order to avoid costly arithmetic with superfluous accuracy. In the light of the preceding 
comment, we can assume that the coefficients are represented by dyadic values with less 
than £ + log(n + 1) + c bits after the binary point for some small constant c. 

Definition 3 (Integer truncation). For a complex number z = a + \b G C, a Gaussian 
integer z = a + ib S Z[i] is called an integer truncation of z if \z — z\ < 1. An integer 
truncation / € Z[i][x] of a polynomial / G C[x] is defined coefficient- wise. 

In what follows, we ignore the fact that there are several truncations for a complex 
number and, for the sake of simplicity, pretend that we can compute "the" truncation 
of / and denote it by trunc(/). This is reasonable since, for any £ > 1, coefficient-wise 
rounding of any £-bit || -Hoc -approximation of / yields a unique truncation (although not 
necessarily the same for different £). 

Theorem 4 (Numerical multiplication of polynomials). Let / £ C[i] and g 6 C[a;] 
be polynomials of degree less than or equal to n and with coefficients of modulus less than 
2 b for some integer b > 1. Then, computing an £-bit \\-\\\- approximation h for the product 
h := f ■ g is possible in 

0{fx{n(£ + b + 2logn))) or 0{n{£ + b)) 
bit operations and with a precision demand of at most 

^ + 6 + 2[log(n + l)] +3 or £ + 0{b + logn) 

bits on each the coefficients of f and g. 



Proof. Let s := £ + b + 2|"log(n + 1)] + 2. Define F := 2 s / and G := 2 s g, and notice that 
H := F G = 2 2s h. We consider polynomials F := trunc(F) and G := trunc(G) G ^[i][^] 
and write AF := F - F and AG := G - G. Since ||AF||i, ||AG||i < n + 1, 

||FG-FG||i < ||AF • G||i + ||F • AG||i + ||AF • AG||i 

< ||AF||i • ||G||i + ||F||i • ||AG||i + ||AF||i • ||AG||i 

< (n + l) 2 2 s+fe + (n + l) 2 2 s + 6 + (n + l) 2 
<(n + l) 2 -2 s+6 + 2 

holds. For h := 2" 2s F G, it follows that 

[|A - h\\! < 2~ 2s (n + l) 2 • 2 s+b + 2 < 2 6 + 21o s(™ +1 )+ 2 - s < 2~ e , 

hence an ^-bit-approximation as required can be recovered from the exact product of 
F and G by mere bitshifts. Since HFHoo, UGHoo < 2 S+ , multiplication of F and G can 
be carried out exactly in 0(//((s + b)n)) bit operations. This proves the complexity 
result. For the precision requirement, notice that HFHoo, ||G||oo < 2 s+b , and thus we need 
(s + b + |~log(ra + 1)] + 3)-bit H-Hoo-approximations of / and g to compute F and G. D 

2.2 Fast Approximate Polynomial Division 

Definition 5 (Numerical division of polynomials). Given a dividend / € C[x], a 
divisor g G C[x], and an integer £ > 1, the task of numerical division of polynomials is to 
compute polynomials Q € C[x] and fi £ C[i] satisfying 

\\f-(Q. g + R)\\ 1 <2- i 

with degQ < deg/ — degg and degi? < deg^. 

Theorem 6. (Schonhage [16, Theorem 4-1]) Let f £ C[x] 6e a polynomial of degree 
< 2n and with norm \\f\\i < 1, and let g £ C[x] be a polynomial of degree n with norm 
1 < llfflli < 2. Suppose that a bound 2 P , with p G N>i, on i/te modulus of all roots of g is 
given. Then, numerical division of f by g up to an error of 2 needs a number of bit 
operations bounded by 

0{p{n{£ + np))) = 0{n(£ + np)). 

In his presentation of the division algorithm, Schonhage carefully analyses the required 
precision for the needed operations in his algorithm as 2 • £ + 5np + 0(n) bits; see [16, 
(4.14) and (4.15)]. Hence, one might conclude that this bound also expresses the precision 
demand on the input polynomials / and g. However, the factor 2 in the above bound is 
only needed for the precision with which the internal computations have to be performed, 
whereas it is not necessary for the precision demand of the input polynomials / and 
g. In particular, for £ ^> np, input and output accuracy are asymptotically identical, 
independently from the algorithm used to carry out the numerical division. For a proof of 
the above claim, we need an additional result from Schonhage which provides a worst-case 
perturbation bound for polynomial zeros under perturbation of its coefficients. 



Theorem 7. (Schonhage [17, Theorem 2.7]) Let f £ C[x] be a polynomial of degree n 
with zeros x%, . . . ,x n , not necessarily distinct, and let f be a log(r/ ||/||i) -approximation of 
f for 7] < 2~ 7n . Then, the zeros x\, ■ ■ ■ , x n of f can be numbered such that \xj — Xj\ < 9 tfrj 
for \xj\ < 1 and \1/xj — 1/xj\ < 9^/r/ for \xj\ > l. 3 

We can now give a stronger version of Theorem 6 which comprises the claimed bound 
on the needed input precision. In addition, we show that, within a comparable time 
bound as given in Theorem 6, we can guarantee that the computed polynomials Q and 
R are £-bit approximations of their exact counterparts. 

Theorem 8. Let /, g and p as in Theorem 6, and Q := / div g and R := / mod g be the 
exact quotient and remainder in the polynomial division of f by g. 

Then, the cost for computing l-bit approximations Q and R of Q and R satisfying 
\\f ~ (Q"fl' + -R))||i < 2 * s bounded by 0(n(£ + np)) bit operations. For this computation, 
we need {£ + 32np)-bit approximations of the polynomials f and g. 

Proof. Let / = / + A/ and g = g + Ag be arbitrary if- and 4,-bit approximations for 
/ = Yjilofi xl and 9 = YJi=o9ix\ where deg/ < 2n, deg<? < deg#, and If and £ g are 
integers to be specified later. 

First, we note that deg g and deg g actually coincide for any £ g > n(p+2) + 1. Namely, 
there exists at least one coefficient g% of g with \gi\ > l/(n + 1) > 2~ n since \\g\\\ > 1, and 
thus \g n \ > \gi\ ■ 2~ n ~ np > 2~ n ( p+2 \ where the second to last inequality follows from the 
fact that |<7j| < \g n \ ■ 2 n Y\ z:g ^ =0 max(l, \z\) < \g n \ • 2 n+np . Hence, in particular, we have 



\9n\ 



\g n \ > 2 -«^+ 2 )- 1 > 2- 4np for all £ g > n{p + 2) + 1. 



Next, we derive a necessary condition on the precision £ g such that 2 2p is a root 
bound for g. Suppose that £ g > max(n(p + 2) + 1, 7n + 1), then we may apply Theorem 7 
to the polynomials g and g which have the same degrees as shown above. For x and x 
an arbitrary corresponding pair of roots of the polynomials g and g, we distinguish two 
cases: 

1. If |x| < 1, it immediately follows \x\ < \x\ + 9 \j2~ iL a and, hence, \x\ < 2 P + 1 < 2 2p . 

2. For x outside the unit circle, we have |x|/|x| > 1 — 9 v2~ i s\x\. Thus, we aim for 
9 1 if2r r a2 p < 1/2 which is fulfilled if £ g > n{p + 5) > n log 18 + rip. 

In what follows, we assume that £ g > max(7n + 1, n(p + 5)). This ensures that the degrees 
of g and g coincide and 2 P , with p := 2p, constitutes an upper bound on the absolute 
value of all roots of g as well as for all roots of g. 

Suppose that f = Q ■ g + R and f = Q ■ g + R are the exact representations of / and 
/ after division with remainder, then we aim to show that the pair (Q, R) is actually a 
good approximation of (Q,R) (i.e., « mm(£f,£ g )-bit approximations) if £f and £ g are 

3 Sch6nhage points out that the theorem also holds for zeros at infinity, that is, in the case where 
deg/ t^ deg/. However, in our applications, the degrees will always be the same. 



both large enough. Write AQ := Q — Q and AR := R — R. The coefficients Qk of Q 
appear as leading coefficients in the Laurent series of the function 

f{x)/x n _ f 2n + hn-l/x + f2n-2/x 2 H _ , Qn-1 , Qn-2 , 

9{x) gn + 9n~l/x + g n -2/X Z H X X 2 

and can be represented, using Cauchy's integral formula, as 

Qk = ±[ f -^fx k -' dx (3) 

for any g > 2 P ; see [16, (4.7)— (4.9)]. Using the corresponding representation of the 
coefficients Qk of Q, we can estimate (here, for any g > 2 P ) 



1 



2tt 



A/(x)- 9 (x)-/(x)-A 5 (x) , n _! 



|Q*-Qfc| = ;r- / -^^^yVfV ^x^ 1 ^. (4) 



|i|=e 



g(x)g(x) 



Throughout the following considerations, we fix g := 2 P + 1 = 2 2p + 1 < 2 3p . The absolute 
value of the numerator of the integrand is bounded by 

(\Af(x)-g(x)\ + \f(x).Ag(x)\)-\x\ k - n - 1 

< (|| A/||i £ 2n • || 5 ||i ^ + H/lli e 2n • IIA^IIx e ») • g k - n - 1 

< (2- £ / +1 + 2~^ +1 ) £) 2n + fc - 1 < (2- £ / +1 + 2^ +1 ) p 2n+fc 

and, for the denominator, we have 

\g(x)g(x)\ > \g n \(g-2 p ) n - \g n \(g-2 p ) n > \g n \ ■ |S»| > 2~ 8 ^. 
Now, using the latter two estimates in (4) yields 

|AQ fc | = \Q k - Q k \ < (2 l - e * + 2"*») • 2 8n " • ^ +fc . 
Summing over all k = 0, . . . , n gives 

||AQ||i = ||Q - g||i < (2~^ +1 + 2^ +1 ) • 2 8n ^ • g 



n+l _ i 
„2nt? x 



0-1 

< f2~^"'" 1 + 2~^ 9+1 ) • 2 8n, ° • o 3n+1 

< (2~ £/+1 + 2~^ s+1 ) • 2 20np < 2 _min ( £ /' £ 9) +20n ' ,+2 

where we used that g = 2 2p + 1 < 2 2p+1 and thus g 3n+1 < 2 12np . Hence, for 

£ f , £ g >£ + 20np + 4, (5) 

the polynomial Q is an (£ + 2)-bit approximation of Q. An analogous computation as 
above based on the formula (3) further shows that 

II Q 111 <2 inp -g 2n+l <2 13np . (6) 



Hence, under the above constraints from (5) for if and i g , we conclude that 

\\R-R\\i = \\(f-Qg)-(f-Qg)\\i 

< IIA/IK + HQIIillAsllx + HAQllxllslli + ||AQ||i||A 5 ||i 

< 2~ £ f + 2 13np • 2 _/ « + 2" £ ~ 2 • 2 + 2~^ 2 • 2 -/ * 

< 2- £ - 3 + 2- £ - 3 + 2"'" 1 + 2- £ - 3 < 2~\ 

thus ij constitutes an £-bit approximation of R. 

We are now in the position to put the pieces together and prove the main statements of 
the theorem. For i := £ + 2>2np > (£-\-3i) + 20rap + 4, 4 we first choose i-bit approximations 
/ and g of / and g, respectively, such that ||/||i < 1 and 1 < \\g\\i < 2. 5 We can now 
apply Theorem 6 to compute polynomials Q and R such that ||/ — (Q ■ g + R)\\\ < 2~^. 
For this step, we need 0(n(£ + np)) bit operations. We define / := Q • g + R and ^ := g, 
where the latter two polynomials are £-bit approximations of / and g, respectively. Thus, 
the above consideration shows that Q and R are (£ + 3)-bit approximations of the exact 
solutions Q and R, respectively. It follows that 

||/ - (Qg + R)\\i < \\(Q - Q) ■ g\\ x + \\R - R\\ x 

<\\Q-Q\\i-\\g\\i + 2-^<2- £ 

holds, completing the proof. D 

Notice that the above result shows that the precision demand for the input polynomials 
is of the same size as the desired output precision plus a term which only depends on 
fixed parameters, that is, n and p. This will turn out to be crucial when considering 
numerical division within the multipoint evaluation algorithm. Namely, since we have 
to perform logn successive divisions, a precision demand of 2 ■ £ (as needed for the 
internal computations in Schonhage's algorithm) for the input in each iteration would 
eventually propagate to a precision demand of n£, which is undesirable. However, from 
the above theorem, we conclude that, for an output precision of £, an input precision of 
£ + 0((logn) • np) is sufficient because, in each of the logn successive divisions, we loose 
a precision of 0{np). In order to give more precise results (and rigorous arguments), we 
first have to make Theorem 8 applicable to polynomials with higher norm as they appear 
in the multipoint evaluation scheme. With this task in mind, we concentrate on the case 
of monic divisors g. 

Corollary 9. Let / £ C[i] be a complex polynomial of degree < 2n with ||/||i < 2 b for 
some integer b > 1, and let g be a monic polynomial of degree n with a given root bound 
2 P , where p £ N>i. Let Q := / div g and R := f mod g denote the exact quotient and 
remainder in the polynomial division of f by g. 



In fact, it suffices to choose I := £ + 27np, however, we aimed for "nice numbers." 
5 Notice that this can always be achieved since we can always choose approximations of / and g which 
decrease or increase the corresponding 1-norms by less than 2~ £ < 1/2. 



Then, the cost for computing t-bit approximations Q and R of Q and R, respectively, 
with ||/ — (Q ■ g + .R))||i < 2~ e is bounded by 

0(n(£ + b + np)) 

bit operations. For this computation, we need (I + b + n(2p + 2|~log2n] + 32))-bit 
approximations of the polynomials f and g. The approximate remainder R fulfills 

II Rll <C ol6n+2np+2nlogf2n]+6 nb+2np+0(n\ogn) /n\ 

Proof. Let s := p + [log 2ra] and £* := £ + b + ns. We define 

/*(x) := 2- b - 2ns f(2 s x) and 5 *(x) := 2~ ns g(2 s x). 

It follows that ||/* ||i < 1 and 1 < ||g*||i since g* is again monic. In addition, the 
scaling of g by 2 s yields that all roots of g* have absolute values less than or equal 
to l/(2n). Thus, the i-th coefficient of g* is bounded by (™)t2~t« which shows that 

115*111 < (1 + w~) n < e < 2. We now apply Theorem 8 to the polynomials /* and 
g* , and to some desired output precision t which will be specified later: Suppose that 
Q* := /* divg* and R* := /* mod g* , it takes 0(n(t + n)) bit operations to compute t- 
bit operations Q* and i?* of Q* and i?*, respectively, such that ||/* — (Q*g + R*)\\i < 2~ e *. 
For this, we need {I* + 32n)-bit operations of the polynomials /* and g*. Notice that 
we used the fact that 2 1 constitutes a root bound for g*. We further remark that, in 
order to compute the approximations for /* and g* , it suffices to consider (£* + 32n)-bit 
approximations of the polynomials / and g. In order to recover approximations for the 
polynomials Q and R, we consider an inverse scaling, that is, 

Q{x) := 2 b+ns Q*(2- s x) and R(x) := 2 2ns+b R*{2- s x). 

Since f(x) = Q(x) ■ g{x) + R(x), we have 

2~ 2ns - b f(2 s x) = 2- b ~ ns Q(2 s x) ■ 2~ ns g(2 s x) + 2~ 2ns - b R(2 S x) , 

S v ' V v ' N v ' N v ' 

/* Q* g* R* 

and, thus, for any I* > b + 2ns, the polynomials Q(x) and R(x) are (£* — b — 2ns)- 
approximations of Q and R, respectively. In addition, ||/ — (Qg + R)\\i < 2~ e +&+2ns_ 
Hence, for £* := £ + b + 2ns, the bound on the bit complexity of the numerical division 
as well as the bound on the precision demand follows. 

For the estimate on [|JR[|i, we recall that (6) yields ||(J*||i < 2 13n which implies that 
||-R*||i < ll/lli + ||Q*||i • \\g*h < 1 + 2 ■ 2 13n < 2 16n . Thus, we have 

II nil ^ o2ns+6|| c>* || -^ nl6n+2ns+b ^ ol6n+2np+2nlog[2n]+fe r\b+2np+0{n\ogn) 

and the same bound also applies to R since it is an £-bit approximation of R. □ 



2.3 Fast Approximate Multipoint Evaluation: Complexity Analysis 

We can now apply the results of the previous two sections to the recursive divide- 
and-conquer multipoint evaluation scheme as described on page 3. Using approximate 
multiplications and divisions, our goal is to compute approximations foj of the final 
remainders roj = F mod (x — Xj) = F(xj) such that \foj — F{xj)\ < 2~ L for all 
j = 1, . . . , n. In other words, we aim to compute L-bit approximations of the remainders 
ro.j. For this purpose, we will do a bottom-up backwards analysis of the required precisions 
for dividend and divisor in every layer of the remainder tree which will yield the according 
requirements on the accuracy of the subproduct tree. 

Theorem 10. Let F £ C[x] be a polynomial of degree n with \\F\\i < 2 r , with r > 1, and 
let xi, . . . ,x n £ C be complex points with absolute values bounded by 2 r , where T > 1. 
Then, approximate multipoint evaluation up to a precision o/2 for some integer L > 0, 
that is, computing yj such that \yj — F(xj)\ < 2~ L for all j, is possible with 

0{n{L + T + nT)). 

bit operations. Moreover, the precision demand on F and the points Xj is bounded by 
L + 0{t + nr + n log n) bits. 

Proof. Define gij and rij as in Algorithm 1. We analyse a run of the algorithm using 
approximate multiplication and division, with a precision of £f lv for the approximate 
divisors g^* and remainders f^ * in the i-th. layer of the subproduct and the remainder 
tree. We recall that deg^,* = deggi,* = 2\ 

According to Corollary 9, for the recursive divisions to yield an output precision ^ > 0, 
it suffices to have approximations fj+i* and g-i^ of the exact polynomials / := rj+i : * and 
9 '■= 9i,* to a precision of 

3+i : = ^ + log[K+i,*||i + 2 i+1 r + 0(i ■ 2 l ) (8) 

bits, since the roots of each g^ are contained in the set {xi, . . . ,x n } and, thus, their 
absolute values are also bounded by 2 r . In addition, it holds that ||riogn,o||i = ll-^lli < 2 T . 
In order to bound the absolute values of the remainders r^* for i < log n, we can use our 
remainder bound from (7) in an iterative manner to show that 

log|K*||i = log||r m ,*||i + 2 i+1 r + 0(i ■2 i ) = r + 2nT + O(nlogra). (9) 

Combining (8) and (9) then yields 

£ div := max if v = £$ v + T + 2nT + 0(n log n). 

i>0 

Hence, choosing £q := L, we eventually achieve evaluation up to an error of 2 
if all numerical divisions are carried out with precision £ dlv . The bit complexity to 
carry out a single numerical division at the i-th layer of the tree is then bounded by 
0{2 i (£ div + t + 2T)) = 0(2'(L + rar + r)). Since there are n/2* divisions, the total cost 
at the i-th layer is bounded by 0(n(L + nT + r)). The depth of the tree equals logn, 
and thus the overall bit complexity is 0(n(L + nT + r)). 
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It remains to bound the precision demand and, hence, the cost for computing 
(L + r + 2nT + 0(n log n))-bit approximations of the polynomials gi^. According to 
Theorem 4, in order to compute the polynomials <ft * to a precision of if 1 , we have to 
consider If^-bit approximations of gi-i t *, where 

C ul := Ci + 21og|| 5i _i, + ||i + 0(i) = CT + 2ir + 0(i) = C ul + 0(logn • T). 

Hence, it suffices to run all multiplications in the product tree with a precision of 
^mui _ £ _|_ T _|_ o^ n Y + nlogn). The bit complexity for all multiplications is bounded by 
0(n£ mnl ) = 0(n(L + r + nF)), and the precision demand for the points Xi is bounded by 
£ mul + 0(r + logn) = L + r + 0{nT + nlogn). □ 

3 Applications 

3.1 Quadratic Interval Refinement for Roots of a Polynomial 

Polynomial evaluation is the key operation in many algorithms to approximate the real 
roots of a square-free polynomial F(x) £ 7L\x\. Given an isolating interval / = (a, b) for 
a real root £ of F (i.e., I contains £ and I = [a, b] contains no other root of F) and an 
arbitrary positive integer L, we aim to compute an approximation of £ to L bits after 
the binary point (or, in other words, an L-bit approximation of £) by means of refining / 
to a width of 2~ L or less. 

A very simple method to achieve this goal is to perform a binary search for the root 
£. That is, in the j-th iteration (starting with Iq := (ao, bo) = (a, b) in the 0-th iteration), 
we split the interval Ij = (aj,bj) at its midpoint rn(Ij) into two equally sized intervals 
I'a = (aj,m(Ij)) and I'J = (m(Ij),bj). We then check which of the latter two intervals 
yields a sign change of F at its endpoints, 6 and define Ij+\ to be the unique such interval. 
If F(m(Ij)) = 0, we can stop because, in this special case, we have exactly computed the 
root £. The main drawback of this simple approach is that only linear convergence can 
be achieved. 

In [1], Abbott introduced a method, denoted quadratic interval refinement (Qir), to 
overcome this issue. It is a trial and error approach which combines the bisection method 
and the secant method. More precisely, in each iteration, an additional integer Nj is 
stored (starting with A^o = 4) and (only conceptually) the interval Ij is subdivided into Nj 
equally sized subintervals I^i, . . . , Ij t Nj- The graph of / restricted to Ij is approximated 
by the secant S passing through the points (aj,F(a,j)) and (bj,F(bj)). The idea is that, 
for Ij small enough, the intersection point xs of S and the real axis is a considerably 
good approximation of the root £, and thus the root £ is likely to be located in the same 
of the Nj subintervals as xs- Hence, we compute xs and consider the unique subinterval 
Ijl, with i G {1, . . . , Nj}, which contains x$- If Ij,£ yields a sign change of F at its 
endpoints, we know that it contains £ and, thus, proceed with Ij+\ '■= Ij/. In addition, we 
set Nj + i := N? This is called a successful Qir step. If we are not successful (i.e., there 
is no sign change of F at the endpoints of Ij/), we perform a bisection step as above and 

6 Here, it is important that / is considered to be square-free. Thus, £ must be a simple root, and any 
isolating interval / = (a, b) for £ yields F(a) ■ F(b) < 0. 
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set Nj+x := min(4, wNj). It has been shown [8] that the QiR method eventually achieves 
quadratic convergence; in particular, all steps are eventually successful. As a consequence, 
the bit complexity for computing an L-bit approximation of £ drops from 0(n 3 L) (using 
the bisection approach) to 0(n 2 L) (for the QiR method) if L is dominating. Namely, 
the number of refinement steps reduces from O(L) to O(logL), and the bit complexity 
in each step is bounded by 0(n 2 L) for both methods (exact polynomial evaluation at a 
rational number of bitsize L). 

In [9, 10], a variant of the QiR method, denoted Aqir, has been proposed. It is 
almost identical to the original QiR method; however, for the sign evaluations and the 
computation of xs, exact polynomial arithmetic over the rationals has been replaced 
by approximate but certified interval arithmetic. Aqir improves upon Qir with regard 
to two main aspects: First, it works for arbitrary real polynomials whose coefficients 
can only be approximated. Second, it allows to run the computations with an almost 
optimal precision in each step which is due to an adaptive precision management and 
the fact that the evaluation points are chosen "away from" the roots of p; see [9, 10] for 
details. In particular, the precision requirement in the worst case drops from 0(nL) to 
O(L) in each step, thus resulting in an overall improvement from 0(n 2 L) to 0(nL) with 
respect to bit complexity. Now, if isolating intervals for all real roots of p are given, then 
computing L-bit approximations of all real roots uses 0(n 2 L) bit operations since we have 
to consider the cost for the refinement of each of the isolating intervals as many times 
as the number of real roots (which is at most n). This is the point, where approximate 
multipoint evaluation comes into play. Namely, instead of considering the evaluations of 
/ for each interval independently, we can perform n many of these evaluations in parallel 
without paying more than a polylogarithmic factor compared to only one evaluation. 
This yields a total bit complexity of 0(nL) for computing L-bit approximations of all 
real roots. We remark that the latter bound is optimal up to logarithmic factors because 
reading the output already needs Q(nL) bit operations. For the special case, where p 
has integer coefficients, we fix the following result: 

Theorem 11. Let F £ Z[x] be a square-free polynomial of degree n with integer coeffi- 
cients bounded by 2 r , and let L be an arbitrary given positive integer. Then, computing 
isolating intervals for all real roots of F of width 2~ L or less uses 0(n 3 T + nL) bit 
operations. 

Proof. Let £i, . . . , £ m denote the real roots of F. We proceed in three steps: 
In the first step, we compute isolating intervals L^, . . . , I^ m for all real roots. 
In the second step, the intervals are refined such that 

wilt. ) < wc, := -5 - — : — p; j z for all k = 1, . . . , m, (10) 

v ikJ 4fc 32ed 3 2 T max{l,|£ fc |} Q! - 1 ' ' ' v ; 

where e ~ 2.71 .. . denotes the Eulerian number. For the latter two steps, we use 
an asymptotically fast real root isolation algorithm, called NewDsc, which has been 
introduced in [15]. The proof of [15, Theorem 10] shows that we need 0(n 3 T) bit operations 
to carry out all necessary computations. 

Finally, we use Aqir to refine the intervals I^ k to a size of 2 or less. Since the 
intervals I^ k fulfill the inequality (10), [9, Corollary 14] yields that each AQiR-step will 
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be successful if we start with Iq := I^ k and Nq := 4. That is, in each of the subsequent 
refinement steps, Ij will be replaced by an interval Ij+i of width w(Ij)/Nj, and we have 
Nj+i = Nj. In other words, we have quadratic convergence right from the beginning and 
never fall back to bisection. According to [10, Lemma 21] and the preceding discussion, 
the needed precision for each polynomial evaluation in the refinement steps is bounded 
by 0(L + nTp + S^ ), where 2 rp denotes a bound on the modulus of all complex roots 
z\, . . . , z n of F, T,p '■= Ya=i l°g <T ( z i)~ 1 ) an d cr(zi) := miiij^i\zi — Zj\ the separation of z%. 
For a polynomial F with integer coefficients of absolute value 2 T or less, we may consider 
IV = 2 T+1 according to Cauchy's root bound, and, in addition, it holds that T,p = 0{tit); 
see [10] and the references therein for details. Thus, the bound on the needed precision 
simplifies to 0(L + tit). In each iteration of the refinement of a single interval Jg fc , we 
have to perform a constant number of polynomial evaluations, 7 hence there are 0{n) 
many evaluations for all intervals. All of the involved evaluation points are located in the 
union of the intervals I^ h , and thus they have absolute value bounded by 2 r . In addition, 
p has coefficients of absolute value bounded by 2 T . Hence, in each iteration, we need 
0(n 2 T + nL) bit operations for all evaluations according to Theorem 10. Since we have 
quadratic convergence for all intervals, there are only O(logL) iterations for each interval, 
hence the claimed bound follows. □ 

3.2 Polynomial Interpolation 

Fast polynomial interpolation can be considered as a direct application of polynomial 
multipoint evaluation. Given n (w.l.o.g. we again assume that n = 2 k is a power of two) 
pairwise distinct interpolation points xi, ■ ■ . , x n G C and corresponding values v\, ■ ■ ■ , v n , 
we aim to compute the unique polynomial F E C[x] of degree less than n such that 
F{xi) = Vi for all i = 1, . . . , n. Using Lagrange interpolation, we have 

n n n n n n 

F( x ) = xn n ^ -x, = ^ vtX ^ ■ n^ - x ^ = z) m ■ n(x - xj), 

where A, := TYj=i- j&(. x i ~ x j) ano ^ ^ := v i ' \~ • Now, in order to compute F(x), we 
proceed in two steps: In the first step, we compute the values Aj. Let g(x) := YYj=i( x ~ x j) 
(notice that g(x) coincides with the polynomial gk,i{x) from (1)), then Aj = g'(xi), and 
thus the values A, can be obtained by a fast multipoint evaluation of the derivative 
g'(x) of the polynomial g{x) at the points X{. We can compute g and g' with 0(n) 
arithmetic operations in C, and, using fast multipoint evaluation, the same bound 
also applies to the number of arithmetic operations to compute all values A«. Hence, 
computing the values \ii takes 0{n) arithmetic operations in C. Now, in order to compute 
F k)1 {x) := F(x) = Yh=i Hi ■ Yl] =1;j ^i(x - Xj), we write 

n/2 ra/2 n n 

Fk,i( x ) =9k-i,i{x) ■ ^2 Hi • Y\( x ~ x j) +9k-iM x ) '^2 ^ ' n^-^j)- ( n ) 

i=l j=l- i=n/2+l j=n/2+l; 

j^i j+i 



i( x ) =:-Pfe-i,2(a) 



7 In fact, there are up to 9 evaluations in each step. See [9, Algorithm 3] for details. 
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Following a divide-and-conquer approach, we can recursively compute F{x) from the 
values fii and the polynomials g%j as defined in (1). It is then straight forward to show that 
0(n) arithmetic operations in C are sufficient to carry out the necessary computations. 
In contrast to the exact computation of F{x) as outlined above, we now focus on 
the problem of computing an L-bit approximation F of F. We assume that arbitrarily 
good approximations of the points x% and the corresponding values V{ are provided. We 
introduce the following definitions: 

r := maxlogmax(2, \xi\) > 1, V := maxlogmax(2, \vi\) > 1, and 

i=l i=l 

n 

A := maxlogmax(l, |Aj|~ ) = maxlogmax(l, I \x% — Xj\ ). (12) 

In Section 2.1, we have already shown that computing £-bit approximations of all 
polynomials gij needs 0(n 2 T + n£) bit operations. Furthermore, we need approximations 
of the points X{ to 0(nT + 1) bits after the binary point. Applying Theorem 10 to the 
derivative g'ix) := g[ k (x) of gi t k(x) and the points xi, . . . , x n then shows that computing 
£-bit approximations Aj of the values Aj uses 0(n 2 T + n£) bit operations since the modulus 
of the points X{ is bounded by 2 r and the coefficients of g' have absolute value of size 
2°(™ r ). The precision demand on g' and the points Xi is bounded by 0(nT + ^) bits after 
the binary point. Now, in order to compute an £-bit approximation fin of fa = V{,\~ , 
we have to approximate Vi and Aj to 0(£ + A + V) bits after the binary point. Hence, 
computing such approximations jli for all i needs 

d(n(l + nT + A + V)) 

bit operations, and the precision demand for the points x^ and the values Vi is bounded 
by 0{£ + nT + A + V) bits after the binary point. For computing F, we now apply the 
recursion from (11). Starting with £-bit approximations of fa and gij, the so-obtained 
polynomial F differs from F by at most i — 0(nT + A + V)-bits after the binary point 
since the coefficients of all occurring polynomials in the intermediate computations have 
modulus bounded by 2°( nr+ +V K Hence, we conclude the following theorem: 

Theorem 12. Let x\, . . . ,x n £ C be arbitrary, but distinct, given interpolation points 
and vi,...,v n S C be arbitrary corresponding interpolation values. Furthermore, let 
F £ C[i] be the unique polynomial of degree less than n such that F(xj) = Vi for all i. 
Then, for any given integer L, we can compute an L-bit approximation F of F with 

0{n{nT + V + A + L)) (13) 

bit operations, where T, V, and A are defined as in (12). The points x\ and the values Vi 
have to approximated to 0(nT + V + A + L) bits after the binary point. 
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Remark 13. In the special case, where x% = e 1 ' « are the n-th roots of unity, we have 
T = 1 and A = logn because Ilfci-j^il^i ~ x j\ = 1 dx ( x »)| = l n ' x ?~ I = n - ^^ e 
bound in (13) then simplifies to 0(n(n + L + V)) which is comparable to the complexity 
bound that one gets from considering an inverse FFT to the vector (v\,... ,v n ) using 
approximate arithmetic [18, Theorem 8.3], regardless of the fact that the latter approach 
is certainly much more reasonable and efficient in practice. 
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3.3 Asymptotically Fast Approximate Taylor Shifts 

Our last application concerns the problem of computing the Taylor shift of a polynomial 
F E C[x] by a given m £ C. More precisely, given oracles for arbitrarily good approxima- 
tions of F and m and a positive integer L, we aim to compute an L-bit approximation of 
F m (x) := F{m + x). Computing the shifted polynomial F m is crucial in many subdivision 
algorithms to compute the roots of a polynomial. Asymptotically fast methods have 
already been studied in [18] and [6], where the computation of the coefficients of F m is 
reduced to a multiplication of two polynomials. We follow a slightly different approach 
based on multipoint evaluation, where the problem is reduced to an evaluation-interpo- 

■ 27TJ 

lation problem. More specifically, we first evaluate F at the n points x% := m + e*'~, 
where n := degF + 1. We then compute F m as the unique polynomial of degree less than 
n which takes the values V{ := p{xf) at the roots of unity Wj := e l '~ . In the preceding 
sections, we have shown how to carry out the latter two computations with an output 
precision of £ bits after the binary point. Theorem 12 and the subsequent remark shows 
that, in order to compute an L-bit approximation of F m , it suffices to run the final 
interpolation with an input precision of 0{n + L + V) bits after the binary point, where 
V = max^ logmax(2, |L(xj)|) = logn2 T (2max(l, |m|))™ = 0(n + r + nlogmax(l, \m\)) 
and Halloo < 2 T . The cost for the interpolation is bounded by 

0{n{n + L + V)) = 0(n(n + L + r + nlogmax(l, |m|))). 

It remains to bound the cost for the evaluation of F at the points Xj. Since we need 
approximations of F(xi) to 0(L + n + r + nlogmax(l, |m|)) bits after the binary point 
and \xi\ < 2max(l, \m\) for all i, Theorem 10 yields the bound 

0(n(n + nlogmax(l, \m\) + r + L)) 

for the number of bit operations to run the approximate multipoint evaluation. The 
polynomial F and the points Xi have to be approximated to 0{n+n log max(l, \m\)+T+L) 
bits after the binary point. We fix the following result which provides a complexity bound 
comparable to [18, Theorem 8.4]: 

Theorem 14. Let F £ C[i] be a polynomial of degree less than n with coefficients of 
modulus less than 2 r , and letmGC be an arbitrary complex number. Then, for any given 
positive integer L, we can compute an L-bit approximation F m of F m (x) = F(m + x) 
with 

0(n(n + nlogmax(l, |m|) + r + L)) 

bit operations. For this computation, the coefficients of the polynomial F as well as the 
point m have to be approximated to 0(n + nlogmax(l, |m|) + r + L) bits after the binary 
point. 
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